The Role of Lexical Resources in CJK Natural Language Processing

نویسنده

  • Jack Halpern
چکیده

The role of lexical resources is often understated in NLP research. The complexity of Chinese, Japanese and Korean (CJK) poses special challenges to developers of NLP tools, especially in the area of word segmentation (WS), information retrieval (IR), named entity extraction (NER), and machine translation (MT). These difficulties are exacerbated by the lack of comprehensive lexical resources, especially for proper nouns, and the lack of a standardized orthography, especially in Japanese. This paper summarizes some of the major linguistic issues in the development NLP applications that are dependent on lexical resources, and discusses the central role such resources should play in enhancing the accuracy of NLP tools.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

The Contribution of Lexical Resources to Natural Language Processing of CJK Languages

The role of lexical resources is often understated in NLP research. The complexity of Chinese, Japanese and Korean (CJK) poses special challenges to developers of NLP tools, especially in the area of word segmentation (WS), information retrieval (IR), named entity extraction (NER), and machine translation (MT). These difficulties are exacerbated by the lack of comprehensive lexical resources, e...

متن کامل

Exploiting Lexical Resources for Disambiguating CJK and Arabic Orthographic Variants

The orthographical complexities of Chinese, Japanese, Korean (CJK) and Arabic pose a special challenge to developers of NLP applications. These difficulties are exacerbated by the lack of a standardized orthography in these languages, especially the highly irregular Japanese orthography and the ambiguities of the Arabic script. This paper focuses on CJK and Arabic orthographic variation and pro...

متن کامل

First Language Activation during Second Language Lexical Processing in a Sentential Context

 Lexicalization-patterns, the way words are mapped onto concepts, differ from one language      to another. This study investigated the influence of first language (L1) lexicalization patterns on the processing of second language (L2) words in sentential contexts by both less proficient and more proficient Persian learners of English. The focus was on cases where two different senses of a polys...

متن کامل

Processing of Lexical Bundles by Persian Speaking Learners of English

Formulaic sequence (FS) is a general term often used to refer to various types of recurrent clusters. One particular type of FSs common in different registers is lexical bundles (LBs). This study investigated whether LBs are stored and processed as a whole in the mind of language users and whether their functional discourse type has any effect on their processing. To serve these objectives, thr...

متن کامل

The Role of Private Speech Produced by Intermediate EFL Learners in Lexical Language Related Episodes

Private speech utilization is accepted to have a critical role in the continuum of language acquisition. As a valuable device in studying learners’ talk during interaction, a language related episode (LRE) is any part of a dialogue where a student speaks about a language problem s/he comes across while completing a task. The present study investigated the role of private speech produced by Inte...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006